\section{Introduction}

There are more than 7000 languages in the world~\cite{ethnologue}, which fall into more than 140 genetic
families, each comprising languages descended from a common ancestor. The aim of traditional historical linguistics is to trace
the evolutionary path, a tree, of extant languages to their extinct common ancestor. Genealogical relationship
is not the only characteristic that relates languages; languages can also share structural features
such as \emph{word order}, \emph{phoneme inventory size} and \emph{morphology}. For instance, Finnish
and Telugu are geographically remote and yet have a similar agglutinative morphology. It would be a grave error
to posit that these two languages are genetically related on the basis of a single shared structural feature. There have
been attempts in the past~\cite{nichols1995diachronically} to rank the stability of structural features.
Stability refers to the resistance of a structural feature to change across space and time. For instance, Dravidian
languages have adhered to subject-object-verb (SOV) word order for the last two thousand
years~\cite{krishnamurti2003dravidian,dunn84ger}. Hence, it can be claimed that the structural feature SOV is very
stable in the Dravidian language family. Structural features have also recently been used for inferring the evolutionary
tree of a small group of unrelated Oceanic languages~\cite{dunn2005structural}.

In computational linguistics, genealogical distances have been shown to be a good predictor of
the difficulty of machine translation~\cite{birch:2008}. However, the use of typological distances in developing
NLP tools remains largely unexplored. Typologically similar languages provide useful leverage when working
with low-resource languages.

% We are not aware of any other study, apart from the work of~\cite{pecina2010lexical} who, conducts an
% extensive empirical study of comparing 75 collocation measures for the task of collocation extraction from corpora.
In this paper, we investigate the effectiveness of sixteen vector similarity measures in computing typological
distances, evaluating them on two tasks: the internal classification of language families and correlation with
the lexical divergence within a family.

The paper is structured as follows. In Section~\ref{sec:related}, we summarize the related work.
Section~\ref{sec:database} describes the WALS dataset, the lexical database and the criteria for preparing the final dataset.
Section~\ref{sec:measures} presents the different vector similarity measures and the evaluation procedure. The results
of our experiments are given in Section~\ref{sec:results}. We conclude the paper and discuss future directions
in Section~\ref{sec:conclusions}.

